Abstract

We discuss various aspects of the statistical formulation of the theory of random 
graphs, with emphasis on results obtained in a series of our recent publications. 

The backbone of a generic complex system can be represented by a graph: the
nodes refer to the system subunits and the links represent the interactions.
Such a representation can be used, for example, to describe an ensemble of
agents interacting in an economic framework or a telecommunication network.
Other examples of natural networks can be found in recent reviews [1].

In this communication we discuss a statistical formulation of graph
theory [2,3,4,5]. The basic concept in this approach is that of a statistical
ensemble: a configuration space (a set of graphs) is endowed with a probability
measure. Instead of measuring an observable on a single graph, one measures
its average in the ensemble. One learns about the stability and typicality of
graphs by measuring fluctuations in the ensemble. This synchronic approach is
complementary to the diachronic one, where graphs are built step by step by
adding successive nodes and links.

The first step of the construction is to define the configuration space by
choosing the set of graphs to be studied. Of course, there is much freedom in
this choice. Here, for definiteness and unless specified otherwise, we consider
simple, undirected, labeled graphs with $N$ vertices and $L$ links. Graphs are,
in general, multiply connected.

It is convenient to label the nodes of a graph by an index $i = 1, \ldots, N$. A
labeled graph can be represented by an adjacency matrix $A$ whose elements
are $A_{ij} = 1$ if $i$ and $j$ are neighbors and $A_{ij} = 0$ otherwise.

The adjacency matrix of an undirected graph with $N$ nodes and $L$ links is
an $N \times N$ symmetric matrix with zeros on the diagonal and $L$ unities above
(and below) the diagonal. Denote by $M_N$ the set of all $N \times N$ symmetric 0/1
matrices with zeros on the diagonal and by $M_{NL}$ the subset where, in addition,
the number of unities above the diagonal equals $L$. The partition function
for the Erdős-Rényi [6] ensemble of graphs with $N$ nodes and $L$ links can be
written, up to an irrelevant normalization factor, as:

$$ Z = \sum_{A \in M_N} \delta\left(L - \tfrac{1}{2}\,\mathrm{tr}\,A^2\right) = \sum_{A \in M_{NL}} 1 \tag{1} $$
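The ensemble behind (1) can be sampled directly for small graphs. The following is a minimal sketch, not the authors' code: it draws an adjacency matrix uniformly from $M_{NL}$ and checks the constraint $\frac{1}{2}\,\mathrm{tr}\,A^2 = L$. The function names and the fixed seed are assumptions made for illustration.

```python
import itertools
import math
import random


def sample_graph(n, num_links, rng):
    """Draw one adjacency matrix uniformly from M_NL (simple, undirected, labeled)."""
    pairs = list(itertools.combinations(range(n), 2))  # all pairs i < j
    adj = [[0] * n for _ in range(n)]
    for i, j in rng.sample(pairs, num_links):  # L distinct unordered pairs
        adj[i][j] = adj[j][i] = 1
    return adj


def half_trace_of_square(adj):
    """Return (1/2) tr A^2, which counts the links of a symmetric 0/1 matrix."""
    n = len(adj)
    return sum(adj[i][j] * adj[j][i] for i in range(n) for j in range(n)) // 2


def count_graphs(n, num_links):
    """Number of matrices in M_NL, i.e. the sum over M_NL of 1."""
    return math.comb(n * (n - 1) // 2, num_links)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
A = sample_graph(6, 7, rng)
degrees = [sum(row) for row in A]  # row sums of A give the node degrees
```

Since every element of $M_{NL}$ carries the same weight, choosing $L$ of the $N(N-1)/2$ possible pairs at random is enough here; a Markov chain becomes necessary only once non-trivial weights are introduced.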

Let $O = O(A)$ be an observable, a quantity defined on graphs. One is interested
in the average over the ensemble

$$ \langle O \rangle = \frac{1}{Z} \sum_{A \in M_{NL}} O(A) \tag{2} $$

and in the fluctuations, $\langle O^2 \rangle - \langle O \rangle^2$ and the higher-order ones, in the
large-$N$ limit. The number of unities in row $i$ of the matrix $A$, $q_i = \sum_j A_{ij}$, is
equal to the number of links emerging from the node $i$ and is called the node
degree. The probability that a randomly chosen node of the graph has degree $q$ is

$$ p(q) = \left\langle \frac{1}{N} \sum_i \delta(q - q_i) \right\rangle \tag{3} $$

It is not very difficult to calculate $p(q)$ in the $N \to \infty$ limit starting from (1).
The result is Poissonian

$$ p(q) = \frac{\alpha^q}{q!}\, e^{-\alpha} \tag{4} $$

where the constant $\alpha$ is determined by the ratio $L/N$, kept fixed in the limit
(see footnote 2): $\alpha = \langle q \rangle_p = 2L/N$. However, in most interesting real networks
the degree distribution is skewed. In the so-called scale-free networks it has a
fat tail extending over several decades, well fitted by a power law:
$p_{\mathrm{exp}}(q) \sim q^{-\beta}$. The most conservative extension of the Erdős-Rényi ensemble
consists in introducing an additional statistical weight for graphs, replacing
the partition function (1) by [2,3]:

$$ Z = \sum_{A \in M_{NL}} W(A) \tag{5} $$



Footnote 2. Notation: given a positive measure and an observable depending on node
degrees, say $w(q)$ and $O(q)$, we write $\langle O \rangle_w$ for $\sum_q O(q)\, w(q) / \sum_q w(q)$.



The corresponding averages now read

$$ \langle O \rangle = \frac{1}{Z} \sum_{A \in M_{NL}} W(A)\, O(A) \tag{6} $$

The simplest choice for the statistical weight is: 

$$ W(A) = w(q_1)\, w(q_2) \cdots w(q_N) \tag{7} $$
In the $N \to \infty$ limit one now gets

$$ p(q) = \frac{w(q)}{q!} \exp(Aq - B) \tag{8} $$

where the parameters $A$, $B$ are chosen in such a way that $\sum_q p(q) = 1$ and
$\langle q \rangle_p = \sum_q q\, p(q) = 2L/N$. Moreover, the probability that a randomly chosen
graph has degrees $q_1, q_2, \ldots, q_N$ factorizes:

$$ p(q_1, q_2, \ldots, q_N) = p(q_1)\, p(q_2) \cdots p(q_N) \tag{9} $$

This is why one refers to this model as the model of "uncorrelated
networks". For scale-free networks the asymptotic results (8)-(9) are only partly
true because finite-size effects strongly affect the tail of the degree
distribution. First of all, at finite $N$ the tail of $p(q)$ cannot extend to
infinity because there exists some $q_{\max}$ such that the expected number of nodes
with $q > q_{\max}$ is less than unity. Neglecting correlations, one finds the scaling law

$$ q_{\max} \sim N^{1/(\beta - 1)} \tag{10} $$

Furthermore, as shown in [3], the condition that the graphs are simple, i.e.
that self-connections and multiple connections between nodes are absent, implies that

$$ q_{\max} \sim N^{1/2} \tag{11} $$

which for $2 < \beta < 3$ is stronger than (10). Hence, in this case, not only is the
degree distribution cut off but specific correlations are also generated at
finite $N$.
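The comparison of the two cutoffs can be made explicit. For $p(q) \sim q^{-\beta}$ the expected number of nodes with degree above $q_{\max}$ is of order $N q_{\max}^{1-\beta}$, and setting it to one gives (10). The sketch below evaluates both scaling laws with unit prefactors, which is an assumption (the scaling laws fix only the exponents), and reports which cutoff is the smaller one. The labels "natural" and "structural" and the function names are introduced here for illustration only.

```python
def cutoffs(n, beta):
    """Return (natural, structural) cutoff estimates with unit prefactors."""
    natural = n ** (1.0 / (beta - 1.0))  # scaling law (10)
    structural = n ** 0.5                # scaling law (11), from simplicity of graphs
    return natural, structural


def binding_cutoff(n, beta):
    """Return the name and the value of the smaller of the two cutoffs."""
    natural, structural = cutoffs(n, beta)
    if structural < natural:
        return "structural", structural
    return "natural", natural


name_25, value_25 = binding_cutoff(10**6, 2.5)  # a case with 2 < beta < 3
name_40, value_40 = binding_cutoff(10**6, 4.0)  # a case with beta > 3
```

For $N = 10^6$ and $\beta = 2.5$ the estimate from (10) is about $10^4$ while (11) gives $10^3$, so the constraint coming from the absence of multiple links is the one that cuts the tail.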



By choosing the appropriate weight function $w(q)$ for uncorrelated networks
one can reproduce the experimentally observed degree distribution, modulo
the above-mentioned finite-size effects. We have constructed a numerical
algorithm that enables one to simulate the model on a computer. We have also
obtained some further analytic results.
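How a weight function fixes the degree distribution can be sketched numerically. The snippet assumes that the asymptotic result (8) has the form $p(q) = w(q)\, e^{Aq - B}/q!$ and determines $A$ and $B$ from the normalization and from the mean degree $2L/N$. The truncation `q_cut`, the bisection bracket and all function names are assumptions of this illustration, not part of the original algorithm.

```python
import math


def solve_parameters(w, mean_degree, q_cut=80):
    """Return (A, B) such that p(q) = w(q)/q! * exp(A*q - B) is normalized
    and has mean `mean_degree`; the sum over q is truncated at q_cut."""
    # log of w(q)/q! for every degree with non-zero weight
    support = [(q, math.log(w(q)) - math.lgamma(q + 1))
               for q in range(q_cut + 1) if w(q) > 0]

    def log_norm_and_mean(a):
        logs = [lb + a * q for q, lb in support]
        shift = max(logs)  # subtract the maximum to avoid overflow in exp
        terms = [math.exp(v - shift) for v in logs]
        norm = sum(terms)
        mean = sum(q * t for (q, _), t in zip(support, terms)) / norm
        return shift + math.log(norm), mean

    lo, hi = -30.0, 30.0  # bracket for A (an assumption)
    for _ in range(200):  # bisection works because the mean grows with A
        mid = 0.5 * (lo + hi)
        if log_norm_and_mean(mid)[1] < mean_degree:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b = log_norm_and_mean(a)[0]  # B enforces sum_q p(q) = 1
    return a, b


def degree_distribution(w, a, b, q_cut=80):
    """Evaluate p(q) for q = 0..q_cut under the assumed form of (8)."""
    return [w(q) * math.exp(a * q - b - math.lgamma(q + 1)) if w(q) > 0 else 0.0
            for q in range(q_cut + 1)]


# Consistency check: w(q) = 1 should give the Poisson law with alpha = 3.
A, B = solve_parameters(lambda q: 1.0, 3.0)
p = degree_distribution(lambda q: 1.0, A, B)
```

With $w(q) = 1$ the solver returns $A = \ln\alpha$ and $B = \alpha$, i.e. the Poisson law (4). Conversely, under the same assumed form, the choice $w(q) \propto q!\, p^*(q)$ reproduces a target distribution $p^*(q)$ with $A = 0$ whenever $2L/N$ equals the mean of $p^*$.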

In particular, the model is analytically solvable when one restricts one's
attention to tree graphs. In this case $\langle q \rangle_p = 2$ in the large-$N$ limit, since
$L = N - 1$. Assume that $w(q) \sim q^{-\gamma}$ at large $q$. What is the shape of the degree
distribution? The general result (8) no longer holds, because trees constitute a
negligible fraction of all possible graphs. It turns out that an interesting
phase structure emerges [2]:



(a) When $\langle q \rangle_w = 2$ the trees are scale-free, with the degree distribution equal
to $q\,w(q)$, up to normalization.

(b) When $\langle q \rangle_w < 2$ one finds that the degree distribution is, up to normalization,
equal to $q\,w(q)$ for most of the range of $q$, with a singular node showing up at
$q$ of the order of $N$. The winner-takes-all scenario holds.

(c) When $\langle q \rangle_w > 2$ the degree distribution falls exponentially: the scale-free
input is forgotten.
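The three cases are distinguished by a single number, the mean $\langle q \rangle_w$ of footnote 2. A minimal sketch of this classification follows; the representation of $w(q)$ as a dictionary, the tolerance `tol` and the function names are assumptions made for illustration, and the labels refer to the large-$N$ behavior described above.

```python
def mean_degree(weights):
    """<q>_w = sum_q q w(q) / sum_q w(q) for a dict {q: w(q)}."""
    norm = sum(weights.values())
    return sum(q * wq for q, wq in weights.items()) / norm


def tree_phase(weights, tol=1e-9):
    """Return 'a', 'b' or 'c' according to the value of <q>_w relative to 2."""
    m = mean_degree(weights)
    if abs(m - 2.0) <= tol:
        return "a"  # scale-free trees
    return "b" if m < 2.0 else "c"  # singular node, or exponential fall-off


# A pure power-law weight w(q) = q^-3 on q = 1..10000 (truncation is an assumption).
power_law = {q: float(q) ** -3.0 for q in range(1, 10001)}
phase = tree_phase(power_law)
```

Because case (a) requires $\langle q \rangle_w$ to equal 2 exactly, a generic power-law weight must be tuned to reach it; the untuned weight in the example has $\langle q \rangle_w < 2$ and falls into case (b).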

Notice that the scale-free regime is unstable with respect to small distortions
of the input weight. Further information is obtained when one calculates the
fractal dimension $d_H$ controlling the average shortest path $r$ between a pair of
nodes

$$ \langle r \rangle \sim N^{1/d_H} \tag{12} $$

It is known that the generic intrinsic fractal dimension of trees is $d_H = 2$. This
is also what one finds in the case (c) and in the case (a) when $p(q)$ falls faster
than $q^{-3}$, i.e. when $\beta > 3$. When $2 < \beta < 3$

[equation (13) is missing in the source]

In the case (b) $d_H = \infty$.

It is interesting to keep the same microstate weights as before but to assume
that trees are endowed with a causal structure [4]. We say that this is the
case when the node labels always appear in growing numerical order as one
moves along the tree from the root (we have rooted trees in mind) towards
an arbitrary node. Hence only a subclass of labelings is accepted. It turns out
that the most popular growing network models can be reformulated in this
static formalism. The original results are recovered in an elegant fashion. This
shows that the widely accepted distinction between growing and equilibrium
networks is not really correct: the two approaches are just complementary.
Among the new results is the calculation of the fractal dimension. Remarkably
enough, we find that it is generically infinite, $d_H = \infty$, in contrast to what
happens in maximally random trees (see above).

Uncorrelated networks have a local tree structure. This is a well-known fact
in the context of the Erdős-Rényi theory. The same arguments hold in the
generalized set-up. This tree structure persists when simple internode
correlations are introduced. Actually, a general recipe for generating short
loops in static graph models was missing in the literature until recently. We
have succeeded in making progress in this matter [5,7]. One should mention that
short loops are a common feature of natural networks. In particular, the
clustering coefficient is relatively large.



The clustering coefficient for a given vertex $i$ is defined as $C_i = \frac{2 T_i}{q_i (q_i - 1)}$,
where $T_i$ is the number of triangular loops, also called three-cycles, meeting at
$i$. The clustering coefficient of a graph is just the average of $C_i$ over nodes.
The reason why the coefficient is small for uncorrelated graphs is that the
number of three-cycles $T = \frac{1}{6}\,\mathrm{tr}\,A^3$ is small. One can show that for the
Erdős-Rényi graphs the total number of three-cycles approaches a fixed constant,
$\langle T \rangle = \alpha^3/6$ for $N \to \infty$. A similar result holds for cycles with a larger number
of links. Hence, the chance of finding a short cycle through a given node of a
large network is close to zero. This is a manifestation of the local tree
structure of graphs.
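The quantities entering the clustering coefficient can be computed directly from the adjacency matrix. The following is a minimal sketch for small graphs stored as dense lists, assuming the standard definition $C_i = 2T_i/(q_i(q_i-1))$; the convention $C_i = 0$ for nodes with fewer than two links and the function names are assumptions of this illustration.

```python
def triangles_per_node(adj):
    """T_i: number of three-cycles passing through node i."""
    n = len(adj)
    result = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        # count the linked pairs among the neighbors of i
        result.append(sum(adj[a][b] for pos, a in enumerate(nbrs) for b in nbrs[pos + 1:]))
    return result


def clustering_coefficients(adj):
    """C_i = 2 T_i / (q_i (q_i - 1)); nodes with q_i < 2 get 0 (assumed convention)."""
    t = triangles_per_node(adj)
    q = [sum(row) for row in adj]
    return [2.0 * ti / (qi * (qi - 1)) if qi > 1 else 0.0 for ti, qi in zip(t, q)]


def trace_of_cube(adj):
    """tr A^3: number of closed walks of length three."""
    n = len(adj)
    return sum(adj[i][j] * adj[j][k] * adj[k][i]
               for i in range(n) for j in range(n) for k in range(n))


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
T_i = triangles_per_node(A)
C_i = clustering_coefficients(A)
T = trace_of_cube(A) // 6  # each three-cycle is counted six times in tr A^3
```

In this example the pendant link lowers the clustering of node 0 to $1/3$ while nodes 1 and 2 keep $C_i = 1$; the sum of the $T_i$ equals $3T$ because every three-cycle meets three nodes.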

We therefore consider a generalized model for graphs, adding to the Hamiltonian
an interaction term favoring the formation of three-cycles. The
microstate weights (7) are modified as follows (see footnote 3):

$$ W(A) \to W(A) \exp\left(\frac{G}{6}\,\mathrm{tr}\,A^3\right) \tag{14} $$

The resulting model has two phases: the crumpled one and the perturbative one.
The crumpled phase is dominated by graphs which maximize the number of
three-cycles, which is of order $N^{3/2}$. For any $G > 0$ and for large $N$ the term
$\sim G N^{3/2}$ in the exponent exceeds the entropy [8]. Thus the corresponding
configuration plays the role of the ground state of the model for any $G > 0$.
The perturbative phase is obtained by letting the interaction Hamiltonian
act softly on the uncorrelated graphs ($G = 0$). We have developed the
corresponding perturbation theory. It turns out that the two phases are
separated by a free energy barrier similar to that in a first-order phase
transition. Here, however, the barrier has an additional important feature:
when $N \to \infty$ the barrier and the stability range of the perturbative phase
increase. This means that at large $N$ a random walker, representing a local
process in the configuration space, will never be able to roll over the barrier
and reach the ground state. When the uncorrelated graphs are those of
Erdős-Rényi, the value $G = G_{\mathrm{out}}$ of the coupling constant where the system jumps
to the crumpled phase scales logarithmically with $N$: $G_{\mathrm{out}} = x_{\mathrm{out}} \ln N$. By summing
the leading diagrams we get the number of three-cycles in the perturbative phase,
$G < G_{\mathrm{out}}$:

$$ \langle T \rangle = \frac{\alpha^3}{6}\, e^{G} = \frac{\alpha^3}{6}\, N^{x} \tag{15} $$

where the effective coupling constant $x < x_{\mathrm{out}}$ is defined by $G = x \ln N$. Thus,
in this new theory the number of three-cycles grows with $N$. It is a first step
towards a theory of graphs with non-trivial clustering.
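The form of (15) can be made plausible by a heuristic check, which is not the diagrammatic calculation referred to above. Assume that the interaction term weights each graph by $e^{GT}$ with $T = \frac{1}{6}\,\mathrm{tr}\,A^3$, and that at $G = 0$ the number of three-cycles in the Erdős-Rényi ensemble is Poisson distributed with mean $\lambda = \alpha^3/6$, as it is for sparse random graphs at large $N$. If this distribution remains a good approximation in the perturbative phase, reweighting gives

```latex
\langle T \rangle_G
  = \frac{\sum_{T \ge 0} T \, e^{GT} \, \lambda^T / T!}
         {\sum_{T \ge 0} e^{GT} \, \lambda^T / T!}
  = \lambda \, e^{G}
  = \frac{\alpha^3}{6} \, N^{x},
  \qquad G = x \ln N .
```

The second equality follows because the reweighted distribution is again Poissonian, with mean $\lambda e^{G}$.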



Footnote 3. Actually, in ref. [5] the unperturbed model is that of Erdős-Rényi
($W(A) = 1$); the general case will be discussed in our forthcoming publication [7].



The random graph theory formulated in the language of statistical mechanics
can be studied using dynamical Monte Carlo techniques [2,3]. Actually,
all our analytic results were checked and confirmed by such numerical
simulations. The idea behind the Monte Carlo technique is to invent a Markovian
process performing a sort of random walk in the configuration space and
sampling configurations with a frequency proportional to $W(A)$. If the process
is ergodic and the transition probability fulfills the detailed balance
condition, the process generates configurations with the required frequency. A
good candidate for such a process is a sequence of rewirings performed with the
Metropolis probability. Naively, to encode an $N \times N$ adjacency matrix one
requires quadratic ($\sim N^2$) storage capacity. However, since the matrix is sparse
and only the positions of $L$ of its elements are relevant, one can introduce a
linear storage structure [2] which in practice allows one to code networks with
up to $10^6$ to $10^7$ nodes. Finally, let us mention that our code not only produces
graphs but also simulates a thermal motion. This was important in refs. [5,7].
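A rewiring chain of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the code of refs. [2,3]: the elementary move (pick a link, keep one end, move the other end to a random node), the edge list with an auxiliary set as the linear storage, and the requirement $w(q) > 0$ are all choices made here. The proposal is symmetric, so accepting with probability $\min(1, W_{\mathrm{new}}/W_{\mathrm{old}})$ satisfies detailed balance for product weights $W(A) = \prod_i w(q_i)$; ergodicity is not checked.

```python
import random


def metropolis_rewiring(n, links, w, steps, rng):
    """Rewire a simple graph with Metropolis acceptance for W(A) = prod_i w(q_i)."""
    edges = [tuple(e) for e in links]        # linear storage: one entry per link
    present = {frozenset(e) for e in edges}  # fast test for multiple links
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    for _ in range(steps):
        pos = rng.randrange(len(edges))
        i, j = edges[pos]
        if rng.random() < 0.5:               # choose which end stays fixed
            i, j = j, i
        k = rng.randrange(n)                 # proposed new end replacing j
        if k == i or k == j or frozenset((i, k)) in present:
            continue                         # self-link, multiple link or no change
        # only the degrees of j and k change, so only their weights enter the ratio
        ratio = (w(degree[j] - 1) * w(degree[k] + 1)) / (w(degree[j]) * w(degree[k]))
        if rng.random() < min(1.0, ratio):
            present.remove(frozenset((i, j)))
            present.add(frozenset((i, k)))
            edges[pos] = (i, k)
            degree[j] -= 1
            degree[k] += 1
    return edges, degree


rng = random.Random(1)  # fixed seed, chosen only for reproducibility
ring = [(i, (i + 1) % 8) for i in range(8)]
edges, degree = metropolis_rewiring(8, ring, lambda q: 1.0 / (q + 1.0), 2000, rng)
```

Each move conserves $N$ and $L$ and touches only two degrees, so the cost of a step does not grow with the size of the graph; with $w(q) = 1$ every allowed move is accepted and the chain samples the Erdős-Rényi ensemble.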